server : add Hermes-3 tool call support (WIP) #9254
base: master
Conversation
I have a suggestion: you could detect tool call actions by token ids instead of token strings. In my own Python implementation, if the start tool call token id is generated (<|python_tag|> in the case of Llama 3.1), streaming is paused until the stop / end-of-tool-call token is generated. Then a recursive call is made with the output of the tool, and streaming is resumed.
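The pause-and-buffer behaviour described above could be sketched as follows. This is a minimal sketch, not the PR's implementation; the token ids and the `on_token` helper are hypothetical (real ids depend on the model's vocabulary):

```cpp
#include <cassert>
#include <vector>

// Hypothetical token ids -- real values depend on the model's vocab.
constexpr int TOK_TOOL_CALL_START = 128010; // e.g. <|python_tag|> in Llama 3.1
constexpr int TOK_TOOL_CALL_END   = 128011; // e.g. end-of-tool-call / stop token

struct stream_state {
    bool in_tool_call = false;      // streaming is paused while true
    std::vector<int> tool_call_buf; // tokens buffered for the tool call
    std::vector<int> streamed;      // tokens actually sent to the client
};

// Process one generated token id: pause streaming between the start and
// end tool-call tokens, buffering the tool call instead of emitting it.
// Returns true when a complete tool call has just been captured, at which
// point the caller would run the tool and re-prompt with its output.
bool on_token(stream_state & st, int tok) {
    if (!st.in_tool_call && tok == TOK_TOOL_CALL_START) {
        st.in_tool_call = true;      // pause streaming
        return false;
    }
    if (st.in_tool_call) {
        if (tok == TOK_TOOL_CALL_END) {
            st.in_tool_call = false; // resume streaming afterwards
            return true;
        }
        st.tool_call_buf.push_back(tok);
        return false;
    }
    st.streamed.push_back(tok);      // normal token: stream it out
    return false;
}
```
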
@qnixsynapse Yes, it's possible to do so with the Hermes-3 format, but that will not be possible with either Llama 3.1 JSON tool calls or Llama 3.1 custom functions. The goal here is to make it compatible with OAI specs, so relying on special token ids alone won't cover every template. Anyway, I'll consider doing this later on, when tool call templates are more mainstream and patterns start to emerge.
@ngxson Here, for example, we can expand this:

```cpp
else if (has_token("[/INST]") && has_token("[TOOL_CALLS]")) {
    return LLAMA_TOOL_FORMAT_MISTRAL;
}
```

Regarding streaming, this is sufficient I think: upload.mp4
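The template-based detection suggested above could look like the sketch below. It is an illustration, not the PR's code: the real `has_token` presumably inspects the model's chat template or vocabulary, while here it is a plain substring check on the template text, and the `LLAMA_TOOL_FORMAT_HERMES_3` branch and Hermes tags are assumptions:

```cpp
#include <cassert>
#include <string>

enum llama_tool_format {
    LLAMA_TOOL_FORMAT_NOT_SUPPORTED,
    LLAMA_TOOL_FORMAT_HERMES_3,
    LLAMA_TOOL_FORMAT_MISTRAL,
};

// Stand-in for the server's has_token: substring search on the template.
static bool has_token(const std::string & tmpl, const std::string & tok) {
    return tmpl.find(tok) != std::string::npos;
}

// Guess the tool call format from markers present in the chat template.
llama_tool_format detect_tool_format(const std::string & tmpl) {
    if (has_token(tmpl, "<tool_call>") && has_token(tmpl, "</tool_call>")) {
        return LLAMA_TOOL_FORMAT_HERMES_3;
    } else if (has_token(tmpl, "[/INST]") && has_token(tmpl, "[TOOL_CALLS]")) {
        return LLAMA_TOOL_FORMAT_MISTRAL;
    }
    return LLAMA_TOOL_FORMAT_NOT_SUPPORTED;
}
```
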
@ngxson, I appreciate your work; the new feature is great. EDIT: Also, the cases in which the "tools" parameter in the /completion request is null, or an empty array, should be accepted and treated the same as the case in which the parameter does not exist at all.
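The normalization requested above amounts to treating "absent", "null", and "empty array" identically. A minimal sketch, assuming the parsed request is modeled with `std::optional` standing in for a missing or null JSON field (the real server parses JSON; `tool_list` and `request_has_tools` are hypothetical names):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parsed "tools" field: nullopt models an absent or null
// JSON value; an empty vector models "tools": [].
using tool_list = std::vector<std::string>;

// Absent, null, and [] all mean "no tools": fall back to plain completion.
bool request_has_tools(const std::optional<tool_list> & tools) {
    return tools.has_value() && !tools->empty();
}
```
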
```cpp
ss << "<|im_start|>system\n\n";
ss << "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools>\n\n";
for (auto tool : tools) {
    ss << tool.dump(1, '\t') << "\n\n";
```
Why the tabs? They increase the number of tokens, and I don't think they provide useful information.
```diff
-            ss << tool.dump(1, '\t') << "\n\n";
+            ss << tool.dump() << "\n\n";
```
Related to #5695
Close #9031

This is still WIP.

What is working:
- tools via /chat/completion
- stream ==> currently works for non-tool response

Special thanks to @Rocketknight1 for his very detailed blog post: Tool Use, Unified